of recognition accuracy. Three experiments were completed. The first showed that 
88.25% of 4305 trials of Chinese phoneme recognition were correctly recognized. The 
second showed that 74.67% of 900 trials of simulated speaker independent mode 
I. BACKGROUND 



A. INTRODUCTION 

About forty speech recognition/input studies were conducted at the United 
States Naval Postgraduate School (NPS) during the past six years. The conclusion 
suggested by these studies is significant: speech input, compared to 
conventional manual input, is both more accurate and faster. And, since the hands are 
free from typing on the keyboard, users may be able to perform a secondary 
task. From an early experiment conducted in 1980, Prof. Gary Poock 
drew three conclusions: (1) manual input had 183.2% more entry errors; (2) speech 
input was 17.5% faster; (3) speech input allowed subjects to concurrently perform 
25% more on a secondary job. See [Ref. 1] for detailed information. 

Another highly valued finding is that speech input needs only a small amount of 
time to acquaint brand-new users with this input device, and results in a better 
performance than that of a well-trained operator who uses a keyboard as an input 
device. From the same experiment mentioned above, Prof. Poock found that the 
average time for the subjects to practice with the voice recognition equipment and feel 
ready to conduct the experiment was only 3.26 hours. This is much less than the time 
needed for familiarizing an individual with a keyboard device. 

The usage of English speech to input data to computer systems has proved to be 
technically and practically feasible. At the same time the range of potential military 
and commercial applications of this medium appears extensive. All of these 
encouraged the author to initiate this study and, hopefully, to provide some useful 
information for further research and future possible applications of Chinese speech 
recognition/input. 



B. THE LANGUAGE AND THE RECOGNITION 

The language used in most of the studies mentioned above was English. There was, in 
fact, only one experiment that examined a second language, German. As described in 
[Ref. 2], the recognition system functioned equally well when training and testing used 
German as the input language. The same study also examined the capability of the 
recognition system (Threshold Technology T600 voice recognition system) to function 
in a bilingual mode. However, significant degradation was observed when training and 
testing were bilingual in nature. 

During one of his Man-Machine Interface laboratory projects, the author, under 
a programmed scenario, successfully operated the DDN with Chinese speech. The 
DDN stands for Defense Data Network, a large distributed network of computers 
located across the United States and other countries. From 
that preliminary experiment, the author showed that Chinese speech can also be an 
effective input medium for command/control operations. 



C. THE PURPOSE AND THE SCOPE 

Because its phonetic system is imperfect, Chinese speech has suffered a certain 
degree of difficulty. For the same reason, some confusion about the phonetic 
system has arisen over the years. Although the difficulty itself will not 
influence the recognition of Chinese speech, the causes of the difficulty will. 
In addition, that confusion, if not clarified, will be a trouble area for Chinese 
speech recognition in the future. 

The main effort of this study is, then, a thorough examination of Chinese speech 
and the corresponding phonetic system. A brief discussion is provided in Chapter III. 
The detailed discussion of English provided in Chapter II serves mainly to 
establish a reference basis for the later discussion of Chinese speech. A further 
experiment examining Chinese speech recognition was also conducted. The description 
of the experiment and the results obtained are provided in Chapters IV and V 
respectively. Some suggestions for further studies are also discussed in Chapter V. 



D. GENERAL INFORMATION ON THE STUDY 

The studies of the two languages within this thesis focus on the sounds of the 
languages. Hence, it is necessary to point out that English here means American 
English, while Chinese means Mandarin Chinese. Speech sounds are presented during the 
discussion as selected letters enclosed in special characters. To differentiate them, 
the author uses /..../ to present English pronunciations and <....> to present 
Chinese pronunciations. 

The English phonetic system the author used is known as the KK Phonetic 
System, established by two famous American linguists, Dr. John S. Kenyon and Dr. 
Thomas A. Knott. Their A Pronunciation Dictionary Of American English has been an 
international reference book for studying American English. The Chinese phonetic 
system the author used is the only system compiled by the Chinese Department of 
Education in 1918. The system is also known as <Droo In Foo Hao> in Chinese. 
Consult [Ref. 3] and [Ref. 4] for a detailed description. 

The KK System was so well established that it fully complied with the rule of 
thumb for constructing a phonetic alphabet system: one symbol represents only one 
unique sound, and one sound has only one unique symbol. However, this is 
not the case in the Chinese phonetic system. There are symbols representing two or 
even three sounds, and pairs of symbols that actually represent the same sound. This is an 
important feature deserving special attention from those who want to apply the current 
Chinese phonetic system in Chinese speech recognition/input research. Further 
discussion is provided in Chapter III. 






II. AN EXAMINATION OF ENGLISH SPEECH 



A. THE SOUNDS OF ENGLISH 

According to the KK System, there are forty-one sounds used in English, which 
are called the phonemes of English. Among them, seventeen are vowels and twenty-four 
are consonants. These forty-one sounds, depending on the way they are produced, 
have been sorted into ten groups. Each sound is associated with a unique phonetic 
symbol formulated by the International Phonetic Association. (Consult Appendix A 
for more information on the original symbols used.) However, these phonetic 
symbols are usually used only by linguists, and therefore only a few of them can be 
found on the NPS IBM 3800-3 printer system. To ease the discussion, the author 
constructed a symbol system to represent these forty-one sounds; see Table 1 
for the general idea. 

The phoneme is the smallest unit of significant distinctive sound. However, not 
all phonemes can form a syllable, the smallest unit of English words. To form a 
syllable, one and only one vowel sound is required as the base, and it may or may not be 
preceded or followed by any combination of consonants. So /ei/, /bee/, /it/, /head/, and 
/spleen/ are all considered single-syllable words. 
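
The syllable rule above can be sketched in a few lines of Python. This is only an illustration and not part of the original study: the vowel set is copied from the symbols of Table 1, while the function name `is_syllable` and the hand-made split of each word into phoneme symbols are assumptions made for the example.

```python
# Vowel symbols as listed in Table 1 (single vowels and diphthongs).
VOWELS = {
    "ee", "i", "ei", "ea", "au",   # front vowels
    "oo", "o", "oa", "aw", "a",    # back vowels
    "er", "ur", "e", "u",          # central vowels
    "ai", "ow", "oy",              # diphthongs
}


def is_syllable(phonemes):
    """Return True if the phoneme list contains exactly one vowel sound.

    Any number of consonants may precede or follow the vowel, so only
    the vowel count matters in this simplified check.
    """
    return sum(1 for p in phonemes if p in VOWELS) == 1


# The five single-syllable examples from the text, segmented by hand
# (the segmentation itself is an assumption of this sketch).
examples = {
    "ei": ["ei"],
    "bee": ["b", "ee"],
    "it": ["i", "t"],
    "head": ["h", "ea", "d"],
    "spleen": ["s", "p", "l", "ee", "n"],
}

for word, phonemes in examples.items():
    print(word, is_syllable(phonemes))
```

A consonant cluster alone, or a sequence with two vowel sounds, fails the check, which matches the "one and only one vowel" requirement.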

The most reliable way to discriminate phonemes is to examine first the manner 
and then the speech organs used to produce the speech. [Ref. 5] provides intensive 
discussions on the production of each phoneme and can be a very helpful reference. 
Human hearing is a good enough tool to tell the differences among sounds, but it is 
not always reliable in differentiating certain similar sound pairs such as /ee/ and 
/i/, /oo/ and /o/, or /n/ and /ng/. [Ref. 6] is a valuable source of 
detailed information on those sound pairs. 

Certain sounds may be recognized on one speech recognition system but not on 
another system. This is due to the algorithm design adopted by the recognizer 
manufacturers. Although it is beyond the scope of this study, it is proper to note that 
the algorithm of the recognizer has a dominant influence on the recognition 
performance. 
TABLE 1 

PHONEMES OF AMERICAN SPEECH 

Front Vowels (FV):            Back Vowels (BV): 
  1. ee    bee                  1. oo    room 
  2. i     bit                  2. o     woman 
  3. ei    eight                3. oa    coat 
  4. ea    head                 4. aw    law * 
  5. au    laugh *              5. a     car 

Central Vowels (CV):          Diphthongs (DI): 
  1. er    letter *             1. ai    aisle 
  2. ur    hurt *               2. ow    now * 
  3. e     the                  3. oy    boy * 
  4. u     cut 

Fricatives (FR):              Nasals (NA): 
  1. f     five                 1. m     make 
  2. v     very *               2. n     nice 
  3. th    think *              3. ng    king 
  4. the   bathe * 
  5. s     six                Glides (GL): 
  6. z     zoo *                1. y     year 
  7. sh    she *                2. w     wait 
  8. ge    garage *             3. r     right * 
  9. h     him 

Stops (ST):                   Affricates (AF): 
  1. p     pool                 1. ch    chip 
  2. b     but                  2. j     joy 
  3. t     tea 
  4. d     do                 Lateral (LA): 
  5. k     kiss                 1. l     lay 
  6. g     give 

* sounds not used in Chinese. 



B. THE PRODUCTION OF SPEECH SOUNDS 

The production of vowels is primarily done by adjusting the shape and size of the 
oral cavity, the main resonance chamber. Such adjustments are made by altering the 
position of the tongue, jaw and lips. The vocal tract¹, during speech production, 
remains relatively open and unobstructed. The production of consonants is done by 
adopting certain articulatory motions to produce different types of sounds. Therefore, 
we may discuss consonants by examining the place² of articulation and the manner³ of 
articulation used to produce the sounds. During consonant production, some kind of 
obstruction of the vocal tract is observed. 

¹ Vocal tract is the area through which the breath stream passes during the 
production of the sounds. 

In Table 1, some phonetic terms are used. From these 
terms, one can easily obtain some information about the production of each 
category of English speech. Here is a brief introduction to these terms. More 
detailed information can be found in [Ref. 5]. 

A Front Vowel is a vowel which is pronounced with the front part of the tongue 
higher than the rest of the tongue. A Front Vowel is also called a Spread Vowel because it 
is pronounced with the lips spread. A Back Vowel is a vowel which is pronounced 
with the back part of the tongue higher than the rest of the tongue. A Back Vowel is also 
called a Rounded Vowel because it is pronounced with the lips rounded. 
A Central Vowel, then, is a vowel which is pronounced with the middle part of the 
tongue higher than the front or back of the tongue. The shape of the lips for Central 
Vowels is somewhere between spread and rounded. 

All three categories of vowels mentioned above are considered single vowels. 
Diphthongs are sounds that appear to be formed from the blend of two single vowels 
spoken together in the same syllable. What actually happens here is that the 
articulator begins the syllable in the position for one vowel and then shifts with a 
smooth and continuous transition movement toward the position for some other vowel. 
One can easily learn to detect the first and second vowels of the diphthongs. 

A Fricative is a consonant consisting acoustically of friction noises. Fricatives are made 
by directing the breath stream with adequate pressure against one or more points of 
articulation, which leads to their distinctive hissing noises. A Stop is a speech 
sound which involves a complete blocking of the breath stream at some point, which is 
subsequently released with a somewhat audible explosive puff. That is why a Stop is 
sometimes also called an Explosive. Nasal is the name chosen for its class because of the 
distinctive nasal resonance that those sounds uniquely contain. A Glide is a consonant 
that consists primarily of the movement of an articulator which causes a rapid change 
of resonance. A Glide is also called a Semivowel, because the starting position for 
pronouncing each of them is a vowel: /ee/ for /y/, /oo/ for /w/ and /ur/ for /r/. 
Usually the tongue moves from the position of each vowel to that for the following 
vowel in the same syllable. The sounds produced by the articulator movement between 
the two vowels are represented by each Glide respectively. An Affricate is a consonant 
that is made up of two consonants, a Stop followed by a Fricative. A Lateral is 
produced in such a manner that the voiced breath stream escapes laterally over the sides of 
the tongue. 

² Place of articulation includes bilabial, labiodental, linguadental, lingua-alveolar, 
linguapalatal, linguavelar and glottal. 

³ Manner of articulation includes nasal, stop, fricative, affricate, lateral and glide. 



C. THE PITCH AND INTONATION OF ENGLISH 

When you read an English word or a sentence composed of several words, your 
sound flow actually contains different pitches. Although each word has its unique 
pitch pattern in English, it has some variations when the same word is read with other 
words in a sentence. We use intonation as a term for the latter concept. 

English has been described as using four pitch levels. They are extra-high, high, 
mid, and low. To simplify, numbers have been used to designate them. George L. 
Trager and Henry L. Smith, Jr., in their An Outline of English Structure, chose 1 to 
represent low. As the pitch level rises, the representation also increases in number. In 
normal speech, however, extra-high designated by 4 does not occur often. Extra-high 
usually indicates excitement. 

Since pitch is determined by the frequency of the sound, the pitch level is, from 
the viewpoint of linguists, really a relative matter. There is no need to tell the 
difference between the pitches of the same syllable produced by two persons. Similarly, 
an attempt to tell the difference between the pitches of the same syllable produced by 
the same person at different moments is also meaningless. However, there are indeed 
certain rules regarding pitch which must be observed in order to generate 
understandable English words. These rules are as follows: 

1. The principal stressed syllable of a word will be pronounced with 
high pitch (designated by 3). 

2. All the syllables produced before the principal stressed syllable 
will be pronounced with mid pitch (designated by 2). 

3. All the syllables produced after the principal stressed syllable 
will be pronounced with low pitch (designated by 1). 

4. When the principal stressed syllable is the last syllable of a 
word, the vowel sound of the syllable will present a 3-to-1 
falling inflection of pitch. 

5. An auxiliary stressed syllable acts like a principal one; 
the only difference is that its pitch level is located 
between high and mid. 
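
The five rules can be sketched as a small Python function. This is an illustrative reading of the rules, not the author's procedure: the function name `word_pitch`, the hand-made syllable splits, and the value 2.5 used for an auxiliary stressed syllable (standing in for "between high and mid") are all assumptions. The sample words are the ones shown in Figure 2.1.

```python
def word_pitch(syllables, primary, auxiliary=None):
    """Assign a pitch contour to each syllable of a word.

    syllables: list of syllable strings
    primary:   index of the principal stressed syllable
    auxiliary: optional index of an auxiliary stressed syllable
    Returns (syllable, contour) pairs, where contour is a tuple of
    pitch levels (1 = low, 2 = mid, 3 = high).
    """
    last = len(syllables) - 1
    result = []
    for i, syl in enumerate(syllables):
        if i == primary:
            # Rule 1; rule 4 applies when the stressed syllable is final.
            contour = (3, 1) if i == last else (3,)
        elif i == auxiliary:
            # Rule 5: between mid and high; 2.5 is an illustrative value.
            contour = (2.5,)
        elif i < primary:
            contour = (2,)  # rule 2: syllables before the stress
        else:
            contour = (1,)  # rule 3: syllables after the stress
        result.append((syl, contour))
    return result


print(word_pitch(["Mi", "chi", "gan"], primary=0))
print(word_pitch(["pros", "pe", "ri", "ty"], primary=1))
print(word_pitch(["pre", "fer"], primary=1))
print(word_pitch(["from"], primary=0))
```

Because pitch is relative, the numbers only order the syllables of one word; they carry no absolute frequency.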

Some examples applying the rules above are provided in Figure 2.1. 
Again, one should keep in mind that the pitch relationship among the syllables of a 
word is relative. The first three examples are presented in an order such 
that the principal stressed syllable is the first, the second, and then the last syllable of 
each word respectively. The last one is an example of a single-syllable word, which is 
pronounced like the last syllable of the third example. When a word with an 
auxiliary stressed syllable is encountered, that syllable is inserted at a level 
between 3 and 2 and pronounced with a pitch higher than the mid-pitch syllable but 
lower than the high-pitch syllable of the word. 



pitch 
level 
  3      Mi               pe              fe           fro 
  2               pros            pre       e             o 
  1        chigan           rity             er            om 



Figure 2.1 Examples of English Pitch Patterns. 

The 3-level pitch system can also be applied in discussing intonation, where the 
whole sentence is put into a pitch frame having a wider frequency range for each level. 
To obtain the idea, see examples in Figure 2.2. 

pitch 
level 
  3               Mi                       Michigan?                  chigan? 
  2    He is from    chi        Is he from             He is from Mi 
  1                     gan. 

Figure 2.2 Examples of English Sentence Intonation. 

The first example represents the most common and colorless intonation pattern 
in English, which is designated with the number 231. Simple statements and questions 
starting with question words always use this pattern. The second intonation pattern is 
used by what are called 'yes/no questions', and is designated with the number 233. The last 
one is an example of a simple statement colored by extra meaning, and is 
designated with the number 223. Interested readers may consult [Ref. 7] for a complete 
discussion on this subject. The main point the author wants to address here is that the 
pitch pattern of an English word may change depending on how and where it appears 
within a sentence. 
III. AN EXAMINATION OF CHINESE SPEECH 



A. THE SOUNDS OF CHINESE 

The original Chinese phonetic system had 41 symbols. However, the system currently 
in use has only 37 symbols; four symbols were deleted. Two of them represented sounds 
that also exist in English, /ng/ and /v/. The reasons for the two deletions, 
however, were different. The symbol representing the sound /ng/ was deleted because 
the system had another symbol also representing that sound. The symbol for /v/ was 
deleted simply because Chinese does not have the speech sound /v/. The third one was a Nasal 
sound produced with the tongue-front pushed against the hard palate, which does not exist 
in either English or Chinese. The fourth symbol, representing two very similar Front 
Vowels of Chinese, was deleted for, probably, the following two reasons. First, these vowels 
are not able, as other finals are, to form a syllable by themselves; they must follow a 
particular Fricative. Second, the articulation places of the two sounds are the same as 
those of the Fricatives which precede them. This deletion causes some Chinese characters to 
appear to be represented by a single consonant. As a remedy, the author 
uses <ih> to represent the two sounds, shown as the 38th symbol of 
Table 2. 

Given the historical information mentioned above, the author constructed a 
38-symbol table for the Chinese phonetic system, which can also be seen as a 
romanization system. For cross reference, Appendix B provides a table that presents 
several existing romanization systems side by side, namely Yale (YL), Wade-Giles (WG), 
Chinese Phonetic System Second Form (SF), PinYin (PY), and the system suggested by the 
author (SG). The order of the symbols in Table 2 is 
exactly the same as that of the existing phonetic system. The first 21 symbols are 
consonants, also called initials, and the succeeding 17 symbols are vowels or 
combinations of a vowel and a Nasal, also called finals. These aliases reflect 
the features of Chinese pronunciation. Chinese characters are always single-syllable 
sounds. A character is usually an initial followed by a final, an initial and a Glide 
followed by a final, or just a final by itself. In most situations, the characters end with a 
vowel sound. The only two consonants allowed at the end of a 
character are the sounds /n/ and /ng/. Although /n/ is also one of the 21 initials, the 
Chinese phonetic system has another symbol to represent the /n/ sound that appears at 
the end of a character. Hence, those 21 consonants will always be the initial part of a 
character sound. 
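
The three character structures just described can be sketched as a small classifier. This is a hypothetical illustration, not part of the study: the symbol sets follow the author's suggested symbols in Table 2 (entries 2 and 9 are assumed here to be <p> and <g>), the Glide set <i>, <oo>, <iu> is taken from the discussion later in this chapter, and the name `syllable_pattern` is invented for the example.

```python
# The 21 initials and 17 finals, using the symbols of Table 2.
INITIALS = {
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "ch", "hs", "dr", "tsh", "sh", "r", "dz", "ts", "s",
}
FINALS = {
    "a", "o", "e", "ea", "ai", "ei", "ao", "oa",
    "an", "en", "ang", "eng", "er", "i", "oo", "iu", "ih",
}
# Symbols that can act as a Glide between an initial and a final
# (an assumption drawn from the later discussion of <i>, <oo>, <iu>).
GLIDES = {"i", "oo", "iu"}


def syllable_pattern(symbols):
    """Classify a character's symbol sequence, or return None.

    Only the three structures named in the text are recognized.
    """
    if len(symbols) == 1 and symbols[0] in FINALS:
        return "final"
    if len(symbols) == 2 and symbols[0] in INITIALS and symbols[1] in FINALS:
        return "initial + final"
    if (len(symbols) == 3 and symbols[0] in INITIALS
            and symbols[1] in GLIDES and symbols[2] in FINALS):
        return "initial + glide + final"
    return None


print(syllable_pattern(["m", "a"]))          # as in <ma>
print(syllable_pattern(["a"]))
print(syllable_pattern(["g", "oo", "ang"]))  # hypothetical example
print(syllable_pattern(["a", "m"]))          # an initial cannot end a character
```

Sequences outside the three listed structures return `None` in this sketch; it makes no claim that such sequences never occur in Chinese.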



TABLE 2 

CHINESE PHONETIC SYSTEM 

Initials: 

Bilabials                     Glottal 
   1. b     ST                  11. h     FR 
   2. p     ST 
   3. m     NA                Lingua-palatals 
                                12. j     AF 
Labiodental                     13. ch    AF 
   4. f     FR                  14. hs    FR * 
                                15. dr    AF * 
Lingua-alveolars                16. tsh   AF * 
   5. d     ST                  17. sh    FR * 
   6. t     ST                  18. r     FR * 
   7. n     NA 
   8. l     LA                Lingua-alveolars 
                                19. dz    AF + 
Lingua-velars                   20. ts    AF + 
   9. g     ST                  21. s     FR 
  10. k     ST 

Finals: 

Single Vowels                 Combinations 
  22. a     BV +                30. an       * 
  23. o     BV                  31. en       + 
  24. e     CV                  32. ang      * 
  25. ea    FV                  33. eng      + 

Diphthongs                    Single Vowels 
  26. ai    DI                  34. er    FV * 
  27. ei    FV                  35. i     FV + 
  28. ao    DI *                36. oo    BV + 
  29. oa    BV                  37. iu    FV * 
                                38. ih    FV * 

* sounds not used in English; further discussion provided 
+ further discussion provided 
A quick look at Table 2 shows that the consonants of the Chinese phonetic system 
are grouped by the articulation organs used to make each sound. The reasons for this 
were given by Prof. Francis Dow in his work [Ref. 8: p.24] and are quoted below: 

1. The consonants of each category have their homorganic nature 
in articulation. 

2. In the constitution of syllables, certain sets of initials occur 
before certain sets of finals (consult Table 5). 

3. It is more convenient to compare the consonants of each category 
with those in other Chinese dialects. 

In Table 2, the twenty symbols followed by neither '+' nor '*' are sounds also used 
in English. The author selected exactly the same symbols shown in Table 1 to 
represent them respectively. The seven symbols followed by a '+' are sounds also used in 
English, but some details need to be clarified. The eleven symbols followed by a '*' are 
sounds not used in English; therefore, a brief introduction is provided for each of them. 
The following two sections provide detailed discussions of both groups. 



B. CLARIFICATION OF CONFUSIONS IN THE CHINESE PHONETIC SYSTEM 

The 20th and 19th symbols represent a pair⁴ of Affricates. The sound <ts> is 
voiceless, and <dz> is its voiced counterpart. They appear, in English, at 
the end of the plural form of nouns ending with the sound /t/ or /d/ respectively, such as 
hats and hands. 

The 22nd symbol, <a>, represents three different sounds. All of them are used 
in English, but only two are considered phonemes. The first sound is /a/ of car and the 
second sound is /u/ of cut. The third one is the first half of the diphthongs /ai/ and /ow/; 
in Chinese, however, it is the most frequently used sound among the three. The 
author suggests using <aa> to represent this sound, since the lips, when producing it, 
are spread wider than when producing the sound /a/. The symbols for the 
remaining two sounds are, like their English counterparts, <a> and <u>. 



⁴ Two sounds are considered a pair when they adopt the same method and use 
the same articulator and point of articulation for pronunciation. The only difference is 
that one is a voiceless sound and the other is a voiced sound. 

The symbol <a> has already caused irreparable damage in Chinese. No 
one, at present, is able to tell, when encountering a character with the symbol <a>, 
which of the three sounds should be used. Words in Chinese such as mother, 
<ma ma>, lama, <la ma>, or to punch a card, <da ka>, should actually be 
pronounced, according to the author's limited-scale investigation, as <mu mu>, <laa mu> 
and <daa ka>. Since usage is already so confused, no one has the 
authority to say which of the three sounds is the right one for a given 
character. A further wide-range investigation is needed if one is really anxious to use 
the right sound for characters with the symbol <a>. Even then, the end product of 
the investigation would probably be only the sounds used by the majority of the general 
population in this age. Since this is beyond the scope of this thesis, the author leaves the 
problem to future researchers. To simplify the following discussion, 
the author will, from now on, use only <a> to represent the three sounds. 

Both the 31st and the 33rd symbols represent two different sounds. They 
represent the sounds /n/ and /ng/ respectively in some cases, and /e/ followed by /n/ or by 
/ng/ in other cases. Although many people are confused by these two symbols, a 
careful study helps to differentiate their usages. The symbol <en>, in 
most situations, represents the sound /e + n/, except when it appears after the symbols 
<i> and <iu>. In the latter case, <en> represents the sound /n/. The symbol 
<eng>, like <en>, represents the sound /e + ng/ most of the time, but when 
it appears after the symbol <i>, <oo> or <iu>, it represents the sound /ng/. See 
Table 6 for some examples. 
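
The usage rules for <en> and <eng> can be written as a short lookup. The sketch below is illustrative only: the function name `resolve_nasal_final` and the string labels for the sounds are assumptions, and the rules describe what holds "in most situations" rather than every character.

```python
def resolve_nasal_final(final, previous=None):
    """Return the sound represented by the symbol <en> or <eng>.

    final:    "en" or "eng"
    previous: the symbol written immediately before the final, if any
    """
    if final == "en":
        # <en> is a bare /n/ only after <i> or <iu>.
        return "n" if previous in {"i", "iu"} else "e+n"
    if final == "eng":
        # <eng> is a bare /ng/ after <i>, <oo> or <iu>.
        return "ng" if previous in {"i", "oo", "iu"} else "e+ng"
    raise ValueError("only <en> and <eng> are handled")


print(resolve_nasal_final("en", "i"))     # n
print(resolve_nasal_final("en", "oo"))    # e+n
print(resolve_nasal_final("eng", "oo"))   # ng
print(resolve_nasal_final("eng"))         # e+ng
```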

Again, the 35th symbol, <i>, represents three sounds which are also used in 
English. They are /i/ and /ee/ of the Front Vowels and /y/ of the Glides. To tell when <i> 
represents the sound /y/ is easy, because once one notes an <i> appearing before a 
final, one is almost sure that the symbol <i> represents the sound /y/. However, the finals 
<en> and <eng> are two exceptions. In these cases, the symbol <i> represents 
the sound /i/ or /ee/, with the two finals becoming the consonants /n/ and /ng/. 

In telling whether /i/ or /ee/ is represented by the symbol <i> for a 
certain character, one faces the same problem discussed earlier. It is again 
irreparable damage caused many years ago. Secret, as an example, written in 
Chinese as <mi mi>, should in fact be pronounced as <mee mi>. The 
author, for the same reason, leaves the problem to researchers for further study and 
uses the symbol <i> to represent the two sounds throughout the following discussions. 

Although the 36th symbol represents two sounds also used in English, we can 
easily tell them apart by examining the usages of the symbol. The two sounds 
represented by the symbol are /oo/ of the Back Vowels and /w/ of the Glides. Once an <oo> 
is found before a final, in most situations we know that it is the sound /w/. However, the 
final <eng> is the only exception. In this case, and in the case that the <oo> itself 
is the final part of a character, we know that the sound /oo/ is represented. 
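
The position rules for <i> and <oo> can be sketched in the same way. This is a hedged illustration: the function names `resolve_i` and `resolve_oo` are invented here, the label "i or ee" reflects the fact that the choice between /i/ and /ee/ is left open in this thesis, and the rules hold only "in most situations".

```python
def resolve_i(next_final=None):
    """Sound of the symbol <i>, given the final that follows it (if any)."""
    # Before <en> or <eng>, or when <i> is itself the final, it is a vowel.
    if next_final is None or next_final in {"en", "eng"}:
        return "i or ee"
    # Before any other final it acts as the Glide /y/.
    return "y"


def resolve_oo(next_final=None):
    """Sound of the symbol <oo>, given the final that follows it (if any)."""
    # Before <eng>, or when <oo> is itself the final, it is the vowel /oo/.
    if next_final is None or next_final == "eng":
        return "oo"
    # Before any other final it acts as the Glide /w/.
    return "w"


print(resolve_i("a"), resolve_i("en"), resolve_i())
print(resolve_oo("an"), resolve_oo("eng"), resolve_oo())
```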



TABLE 3 

SUPPLEMENTARY TO THE CHINESE PHONETIC SYSTEM 

NO    Original    Suggested    Articulation 

22    <a>         <a>          BV 
                  <u>          CV 
                  <aa>         CV 

31    <en>        <en>         CV + NA 
                  <n>          NA 

33    <eng>       <eng>        CV + NA 
                  <ng>         NA 

35    <i>         <i>          FV 
                  <ee>         FV 
                  <y>          GL 

36    <oo>        <oo>         BV 
                  <w>          GL 

37    <iu>        <iu>         FV 
                  [illegible]  GL 


Table 3 provides a summary of this section, listing all the symbols that are 
easily confused. The first column of the table is the number of each symbol, which 
corresponds to the number appearing in Table 2. The second column lists all the 
symbols discussed in this section except 19 and 20. Symbols 19 and 20 are not 
included because they are not confused at all. Symbol 37 is listed here too, but its 
discussion is provided in the next section, because it is a sound existing uniquely in 
Chinese. The third column gives the author's suggested symbols for the sounds that 
each original symbol actually represents, according to the discussions in this chapter. The last column 
provides articulation information on each symbol. Consult Table 1 and the discussions 
provided in Chapter II for a better understanding of the abbreviations used here. 
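
The rule of thumb from Chapter I (one symbol, one sound) can be checked mechanically against the entries of Table 3. The sketch below is an assumption-laden illustration: the dictionary copies the legible rows of Table 3 and adds the initial <n> of Table 2, while symbol 37 is left out because its second entry cannot be read. The function names are invented for the example.

```python
# Sounds attributed to each symbol, following Table 3.
SYMBOL_SOUNDS = {
    "a": ["a", "u", "aa"],
    "en": ["en", "n"],
    "eng": ["eng", "ng"],
    "i": ["i", "ee", "y"],
    "oo": ["oo", "w"],
    "n": ["n"],  # the initial <n>, symbol 7 of Table 2
}


def ambiguous_symbols(symbol_sounds):
    """Symbols that stand for more than one sound."""
    return {s: v for s, v in symbol_sounds.items() if len(v) > 1}


def shared_sounds(symbol_sounds):
    """Sounds that are written with more than one symbol."""
    writers = {}
    for symbol, sounds in symbol_sounds.items():
        for sound in sounds:
            writers.setdefault(sound, []).append(symbol)
    return {snd: syms for snd, syms in writers.items() if len(syms) > 1}


print(sorted(ambiguous_symbols(SYMBOL_SOUNDS)))
print(shared_sounds(SYMBOL_SOUNDS))
```

Both results are non-empty, so each half of the rule is violated at least once: some symbols carry several sounds, and the sound /n/ is written with two different symbols.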

C. INTRODUCTION TO UNIQUELY EXISTING SOUNDS IN CHINESE 

The 14th sound of the Chinese phonetic system, <hs>, is a Fricative. To 
produce the sound, one raises the tongue-front toward the hard palate without 
touching it, and lets the tongue-tip stretch down against the lower teeth 
ridge. With the tongue held in this position, an unvoiced breath stream is directed 
against the hard palate, lower teeth ridge and teeth to produce the sound <hs>. 

The 17th and 18th symbols, <sh> and <r>, also represent a pair of Fricatives. 
These two sounds do not appear in English, but they have some similarities to the 
sound pair /sh/ and /ge/ in English. The author directly 'borrowed' the symbols from 
English for the reasons mentioned below. 

The only difference between /sh/ and <sh> is the articulators used by the two 
sounds. The /sh/ sound requires raising the tongue-mid toward the hard palate; while 
the <sh> sound uses tongue-front to stretch toward the hard palate. Everything else 
is the same. 

Just as /ge/ is to /sh/, the sound <r> is the voiced counterpart of <sh>. 
The reason we do not use <ge> is that /r/ also has some similarities to the sound 
<r>, and /r/ appears more often as an initial, which is exactly the characteristic that <r> 
has. The way to produce the sound <r> is the same as producing <sh>, except for 
adding the vibration of the vocal cords, which is the main feature of voiced sounds. 

The 16th and 15th symbols, <tsh> and <dr>, are a pair of Affricates. Their 
relationship with the sound pair <sh> and <r> is just like that of /ch/ and /j/ to 
the sound pair /sh/ and /ge/ in English. That is why the author selected <t + sh> 
and <d + r> to represent the two sounds respectively. The way to produce the 
sound <tsh> is just as the symbol itself suggests: prepare as if you 
were going to produce a <t> sound and, when ready, actually produce an <sh> sound 
instead. It is similar to producing the English sound /ch/, except that a different 
articulator is used. Producing <dr> is the same as producing <tsh>, except for adding a 
vibration of the vocal cords, since <dr> is its voiced counterpart. 

Although the 28th sound, <ao>, does not appear in English, there is indeed a 
very similar sound in English, namely /ow/. The only difference between the two 
sounds is the starting position of the first half of the sound. The sound /ow/ is a diphthong 
formed by blending <aa> and <o> together, whereas the sound <ao> is 
produced by blending <a> and <o> together. A careful examination of the shape of 
the lips helps to distinguish the two sounds <a> and <aa> without any 
difficulty. 